%matplotlib inline
import seaborn as sns
import pandas as pd
import geopandas as gpd
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px
Sub-question:
To understand how average movements are distributed per area in the Netherlands, this section extracts and cleans the geometry of the Netherlands so that it can be used with the geopandas package. This also helps identify the areas where people's trips were influenced most by COVID-19. For more detailed data, see the notebook Simmon's Visualization.
#Read the geographic information of the Netherlands
lsoas = gpd.read_file('gadm41_NLD_1.shp')
#Drop the water-body areas of the Netherlands (rows 5 and 12), for which no data are available
df1 = lsoas.drop(index=[5, 12])
#Plot the map of the Netherlands
fig, ax = plt.subplots(1, 2, figsize=(8, 8))
lsoas.plot(ax=ax[0])
df1.plot(ax=ax[1], color='r', edgecolor='w')  #excludes the water-body areas
a = lsoas.iloc[13:14, :]
a.plot(color='yellow', ax=ax[0])
b = lsoas.loc[[5, 12]]
b.plot(ax=ax[0], color='green')  #the water-body areas, shown in green
The red map on the right is the one we use in the visualizations.
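Dropping the water bodies by row position (5 and 12) only works as long as the shapefile keeps its row order. A minimal sketch of a name-based alternative is given here, using a plain pandas table as a stand-in for the attribute table of the shapefile. The column `NAME_1` is the one used for the join later in this notebook; the water-body names in `WATER_BODIES` and the toy rows are assumptions for illustration and should be checked against the actual file.

```python
import pandas as pd

# Hypothetical stand-in for the attribute table of the GADM level-1 layer.
attrs = pd.DataFrame(
    {"NAME_1": ["Drenthe", "IJsselmeer", "Utrecht", "Zeeuwse meren"]}
)

# Assumed names of the water-body rows; verify them against the real shapefile.
WATER_BODIES = ["IJsselmeer", "Zeeuwse meren"]

# Boolean mask: True for rows that are water bodies.
is_water = attrs["NAME_1"].isin(WATER_BODIES)

# Keep only the land areas.
land = attrs[~is_water]
print(land)
```

The same boolean mask should work on the GeoDataFrame itself, since it supports pandas-style row selection.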
In this section we process the dataset obtained from CBS to show the average number of movements.
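The processing in this section reshapes a wide table (one column per region and year) into the long format that plotly.express expects (one row per type, region and year). As a minimal sketch of that idea, assuming the region and year are stored as two column levels, `DataFrame.unstack()` can produce the long table without building the region list by hand. The toy table, its values and the region names are hypothetical and are not the CBS figures.

```python
import pandas as pd

# Hypothetical wide table: rows are transport types, columns are (Region, year).
columns = pd.MultiIndex.from_product(
    [["Region A", "Region B"], ["2019", "2020"]], names=["Region", "year"]
)
wide = pd.DataFrame(
    [[100, 80, 90, 70], [50, 40, 45, 35]],
    index=pd.Index(["Bicycle", "Train"], name="type"),
    columns=columns,
)

# unstack() on a frame with a simple row index returns a Series indexed by
# (Region, year, type); reset_index() turns those levels into columns.
long = wide.unstack().rename("number of").reset_index()
print(long)
```

Each row of `long` then carries its own region and year, so the labels cannot drift out of step with the values.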
#Load the original dataset of average movements
data=pd.read_csv('Movements per person per year.csv')
##################
#Data cleaning
#Row labels for the cleaned dataframe
a=['year',
'Total',
'Passenger car (driver)',
'Passenger car (passenger)',
'Train',
'Bus/tram/metro',
'Bicycle',
'To walk',
'Other mode of transport']
#Define a function to obtain the data for one region (four consecutive columns starting at position b)
def Obtain(b):
    y = data.loc[[0, 2, 3, 4, 5, 6, 7, 8, 9], list(data.columns[b:b+4])]
    #Give all four columns the name of the first one
    y = y.rename(columns={y.columns[1]: y.columns[0], y.columns[2]: y.columns[0], y.columns[3]: y.columns[0]})
    y.index = pd.Series(a)
    # y=y.reindex(index=a,columns=[y.columns[0]]*4)
    return y
#For example, obtain the data for the whole of the Netherlands
test = Obtain(2)
#Then extract the data for the remaining regions and join them column-wise
for i in range(6, 54, 4):
    m = Obtain(i)
    test = test.join(m)
#Build a data structure that can be used by plotly.express
u = test.iloc[2:, :]
u.columns = pd.Series([i for i in test.iloc[0, :]])  #use the years (first row) as column labels
u
b = u.stack().reset_index()
#Rename the columns
b.columns = pd.Series(['type', 'year', 'number of'])
#Manually build the list of regions in the row order produced by stack() (one pass over the columns per trip type)
Province = []
for i in range(7):
    for j in test.columns:
        Province.append(j)
#Create a new column that represents the regions
b['Region'] = Province
#Convert the values to float so that they can be compared and used in calculations
b['number of'] = pd.to_numeric(b['number of'], errors='coerce')
b = b.fillna(0)
#Plot a bar graph per region, animated over the regions
px.bar(b, x='year', y='number of', animation_frame='Region', color='type', title='Average trips per person per year')
A clear drop in the average number of trips per person per region in the Netherlands can be seen in 2020, when the government introduced strict lockdown regulations.
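To put a number on a drop like this rather than reading it off the bar chart, the year-over-year percentage change can be computed directly. The following is a minimal sketch with pandas; the totals in `totals` are hypothetical placeholders, not the CBS figures.

```python
import pandas as pd

# Hypothetical yearly totals of trips per person for one region.
totals = pd.Series(
    {2018: 1000.0, 2019: 1000.0, 2020: 800.0, 2021: 850.0}, name="Total"
)

# Percentage change relative to the previous year (NaN for the first year).
pct = totals.pct_change() * 100

# Change from 2019 to 2020, in percent.
drop_2020 = pct.loc[2020]
print(pct.round(1))
```

Applied to the real table, the same call on each region's 'Total' row would give the size of the 2020 drop per region.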
#Now derive the dataframe to join to the shapefile (to see the distribution of trips in the Netherlands per year, including during COVID-19)
u = test.iloc[1:, :]  #this time keep the 'Total' row
u.columns = pd.Series([i for i in test.iloc[0, :]])  #use the years (first row) as column labels
u
b = u.stack().reset_index()
b.columns = pd.Series(['type', 'year', 'number of'])
b
Province = []
for i in range(8):
    for j in test.columns:
        Province.append(j)
b['Region'] = Province
#Rename the regions to their Dutch names to match the original SHP file
Geo=b.set_index('Region').rename(index={'North Brabant':'Noord-Brabant','North Holland':'Noord-Holland','Zealand':'Zeeland'})
#Convert the values from str to int so that they can be used numerically
Geo['number of']=Geo['number of'].astype('int')
Geo=Geo.reset_index()
#Plot the data per region for each year on the map to see the distribution
year = [2018, 2019, 2020, 2021]
f, ax = plt.subplots(4, 1, figsize=(50, 60))
for i in range(4):
    G = Geo[(Geo['type'] == 'Total') & (Geo['year'] == str(year[i])) & (Geo['Region'] != 'The Netherlands')].set_index('Region')
    DATA = df1.join(G, on='NAME_1')
    DATA.plot(column='number of', scheme='equal_interval', k=6, alpha=1,
              edgecolor='w', linewidth=1, legend=True, ax=ax[i])
    ax[i].set_title(f'Distribution of average trips per resident per year in the Netherlands during COVID-19 ({year[i]})')
The detailed map of the Netherlands, including the name of each province, is shown below.

We display maps of the average number of trips per person for each region of the Netherlands from 2018 to 2021 to see whether some regions were heavily affected by COVID-19 (see the figures above). The geographic and shapefile data for the Netherlands were extracted from gadm.org and cleaned before plotting.
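The maps are drawn with `scheme='equal_interval'` and `k=6`. As far as we understand, this splits each year's value range into six equal-width classes, computed separately for every subplot, so the same color can stand for different values in different years. The sketch below illustrates that with NumPy. It is not the classifier that geopandas calls internally; `equal_interval_edges` is a hypothetical helper and the totals are made-up values.

```python
import numpy as np


def equal_interval_edges(values, k=6):
    """Return the k + 1 edges of k equal-width classes spanning min..max."""
    values = np.asarray(values, dtype=float)
    return np.linspace(values.min(), values.max(), k + 1)


# Hypothetical regional totals for two different years (not the CBS figures).
trips_year_a = [930, 955, 980, 1001, 1027]
trips_year_b = [814, 840, 866, 890, 918]

edges_a = equal_interval_edges(trips_year_a)
edges_b = equal_interval_edges(trips_year_b)

# The class boundaries differ between the two years.
print(np.round(edges_a, 2))
print(np.round(edges_b, 2))
```

Because the edges depend on each year's minimum and maximum, a change of color between two maps shows a change of position within the year's range, not necessarily the size of the absolute change.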
Based on the maps, we found that Drenthe, Overijssel and Gelderland (the yellow regions on the map) are the regions where people make more trips than people in other regions. Drenthe is the region where residents' trips were influenced most by the COVID-19 lockdown: its value class changed from 1010.83-1027 to 866-883.33 (and its color from yellow to green), i.e. it moved to a lower class. Interestingly, the average number of trips in Friesland increased slightly during the pandemic.
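A comparison like this can also be checked numerically by computing the change per region between two years and sorting the result. The following is a minimal sketch with pandas; the region names and values are hypothetical placeholders, and with the real data the 'Total' rows of `Geo` for 2019 and 2020 would take their place.

```python
import pandas as pd

# Hypothetical 'Total' trips per person for three regions in two years.
trips = pd.DataFrame(
    {"2019": [1000, 950, 900], "2020": [850, 900, 910]},
    index=pd.Index(["Region A", "Region B", "Region C"], name="Region"),
)

# Absolute and relative change from 2019 to 2020.
trips["change"] = trips["2020"] - trips["2019"]
trips["change_pct"] = trips["change"] / trips["2019"] * 100

# Sort ascending so the largest decrease comes first.
ranked = trips.sort_values("change")
most_affected = ranked.index[0]
print(ranked)
```

Regions with a positive `change` would be those where trips increased during the pandemic.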